What is claimed is: 

1. A Single Instruction Multiple Data (SIMD) processor for 
executing SIMD instructions, comprising: 

a decoding unit operable to decode an instruction; and 
an execution unit operable to execute the instruction based
on a result of the decoding performed by the decoding unit,

wherein the execution unit, when the decoding unit decodes 
an instruction for making a judgment on comparison results of a 
SIMD compare instruction executed on a plurality of data elements, 
judges whether the obtained comparison results are all the same or
not among the plurality of data elements, and generates a judgment 
result. 

2. The SIMD processor according to Claim 1, 

wherein the execution unit judges whether the comparison
results are all zero or not, and generates a judgment result.

3. The SIMD processor according to Claim 1 further comprising a 
flag storage unit operable to store a flag, 

wherein the execution unit stores, into the flag storage unit,
the comparison results of the SIMD compare instruction, together
with the generated judgment result. 
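The judgment in Claims 1 to 3 can be modeled in a few lines. This is a minimal sketch, assuming an element-wise equality compare that yields 1 or 0 per element; the function names and the dictionary standing in for the flag storage unit are hypothetical.

```python
def simd_compare_eq(a, b):
    """Element-wise compare: 1 where the elements are equal, else 0 (assumed compare type)."""
    return [1 if x == y else 0 for x, y in zip(a, b)]


def judge(results):
    """Claim 1: are all results the same? Claim 2: are they all zero?"""
    all_same = len(set(results)) == 1
    all_zero = all(r == 0 for r in results)
    return all_same, all_zero


def compare_and_judge(a, b):
    """Claim 3: keep the per-element results together with the judgment.

    The returned dict is a hypothetical stand-in for the flag storage unit.
    """
    results = simd_compare_eq(a, b)
    all_same, all_zero = judge(results)
    return {"results": results, "all_same": all_same, "all_zero": all_zero}
```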

4. A processor that is connected to an external memory, 
comprising:

a register for storing data; 

a decoding unit operable to decode an instruction; and 
an execution unit operable to execute the instruction based 
on a result of the decoding performed by the decoding unit, 
wherein the execution unit, when the decoding unit decodes
an instruction for storing a value held in a register into the external
memory, stores a least significant byte of a higher half word and a
least significant byte of a lower half word into the external memory,
out of word data made up of 4 or more bytes stored in the register.

5. The processor according to Claim 4, 
wherein the execution unit stores the least significant byte of
the higher half word and the least significant byte of the lower half
word into storage locations specified by contiguous addresses in the 
external memory. 
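Claims 4 and 5 can be illustrated with a minimal sketch for a 32-bit word, assuming the lower half word's byte goes to the lower address (the claims fix only that the addresses are contiguous). The function name and the `bytearray` used as external memory are hypothetical.

```python
def store_halfword_lsbs(reg, mem, addr):
    """Store the LSB of each half word of a 32-bit register to mem[addr], mem[addr + 1].

    Byte order in memory (lower half word first) is an assumption.
    """
    mem[addr] = reg & 0xFF               # LSB of the lower half word (bits 7..0)
    mem[addr + 1] = (reg >> 16) & 0xFF   # LSB of the higher half word (bits 23..16)
```

For example, storing `0x12345678` at address 1 writes `0x78` and `0x34` to two adjacent bytes. When each 16-bit lane holds a value in 0..255, this packs both lanes into bytes with one store.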

6. A processor that is connected to an external memory,
comprising: 

a register for storing data; 

a decoding unit operable to decode an instruction; and 
an execution unit operable to execute the instruction based 
on a result of the decoding performed by the decoding unit,

wherein the execution unit, when the decoding unit decodes 
an instruction for storing values held in a first register and a second 
register into the external memory, stores the following data into the 
external memory: a least significant byte of a higher half word and 
a least significant byte of a lower half word out of word data made up
of 4 or more bytes stored in the first register; and a least significant 
byte of a higher half word and a least significant byte of a lower half 
word out of word data made up of 4 or more bytes stored in the 
second register. 

7. The processor according to Claim 6, 

wherein the execution unit stores the following data into 
storage locations specified by contiguous addresses in the external 
memory: the least significant byte of the higher half word and the 
least significant byte of the lower half word in the first register; and
the least significant byte of the higher half word and the least 
significant byte of the lower half word in the second register. 
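The two-register form of Claims 6 and 7 extends the same idea to four contiguous bytes. In this sketch the ordering (first register before second, lower half word before higher) is an assumption, and the function name is hypothetical.

```python
def store_two_regs_halfword_lsbs(r1, r2, mem, addr):
    """Store four bytes to mem[addr:addr + 4]: the half-word LSBs of r1, then those of r2.

    The ordering (first register first, lower half word first) is an assumption.
    """
    packed = []
    for reg in (r1, r2):
        packed.append(reg & 0xFF)          # LSB of the lower half word
        packed.append((reg >> 16) & 0xFF)  # LSB of the higher half word
    mem[addr:addr + 4] = bytes(packed)
```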



8. A processor for decoding and executing instructions, 
comprising:

a register for storing data; 
a decoding unit operable to decode an instruction; and

an execution unit operable to execute the instruction based 
on a result of the decoding performed by the decoding unit, 

wherein the execution unit, when the decoding unit decodes 
an instruction for storing data in at least one higher digit of the 
register, stores the data only in said at least one higher digit of the
register without changing a value in a storage location other than 
said at least one higher digit. 

9. The processor according to Claim 8, 

wherein the register has a storage location for storing 1 word
data, and

the execution unit stores the data in a higher half word of the 
register. 
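For the 1-word register of Claim 9, storing into the higher half word amounts to a masked merge. A minimal sketch, assuming a 32-bit register and a 16-bit value; the function name is hypothetical.

```python
def store_high_half(reg, value):
    """Write the low 16 bits of value into bits 31..16 of reg; bits 15..0 are unchanged."""
    return ((value & 0xFFFF) << 16) | (reg & 0xFFFF)
```

One plausible use, which the claims do not state, is assembling a 32-bit constant from two 16-bit halves without disturbing the half already written.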

10. A SIMD processor for executing SIMD instructions,
comprising:

a flag storage unit operable to store a first flag; 

a decoding unit operable to decode an instruction; and 

an execution unit operable to execute the instruction based 
on a result of the decoding performed by the decoding unit,

wherein the execution unit, when the decoding unit decodes 
an instruction for performing a SIMD operation, the instruction 
including operands specifying a first register and a second register, 
performs the SIMD operation (i) only on the operand held in the
first register when the first flag stored in the flag storage unit
indicates a first status, and (ii) on the operands held in the first
register and the second register when the first flag indicates a
second status.

11. The SIMD processor according to Claim 10, 
wherein the SIMD operation is addition, and 

the execution unit adds (i) a value held in the first register
and said value held in the first register when the first flag indicates
the first status, and (ii) the value held in the first register and a
value held in the second register when the first flag indicates the
second status.

12. The SIMD processor according to Claim 11, 

wherein the execution unit, when two pieces of data a1 and a2
are stored in the first register and two pieces of data b1 and b2 are
stored in the second register, calculates (i) (a1+a1) and (a2+a2)
when the first flag indicates the first status, and (ii) (a1+b1) and
(a2+b2) when the first flag indicates the second status. 
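Claims 10 to 12 can be sketched as a two-lane add whose second operand is chosen by the flag. This assumes a1 and a2 (b1 and b2) are the lower and higher 16-bit halves of a 32-bit register and that lane sums wrap modulo 2**16; the names are hypothetical.

```python
MASK16 = 0xFFFF


def lanes16(reg):
    """Split a 32-bit register into [lower, higher] 16-bit lanes."""
    return [reg & MASK16, (reg >> 16) & MASK16]


def simd_add_flagged(r1, r2, first_status):
    """Claims 10-12: add r1 to itself (first status) or to r2 (second status), per lane.

    Lane sums wrap modulo 2**16 (assumed overflow behavior).
    """
    a = lanes16(r1)
    b = a if first_status else lanes16(r2)
    s = [(x + y) & MASK16 for x, y in zip(a, b)]
    return s[0] | (s[1] << 16)
```

In the first status the same opcode computes x+x, that is, each lane doubled.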

13. A SIMD processor for executing SIMD instructions, 
comprising:

a flag storage unit operable to store a flag;

a decoding unit operable to decode an instruction; and 
an execution unit operable to execute the instruction based 
on a result of the decoding performed by the decoding unit, 

wherein the execution unit, when the decoding unit decodes 
an instruction for performing a SIMD operation, the instruction
including operands specifying a first register and a second register,
performs the SIMD operation (i) only on the operand held in the
first register and rounds an operation result when the flag stored in
the flag storage unit indicates a first status, and (ii) on the
operands held in the first register and the second register and
rounds an operation result when the flag indicates a second status.

14. The SIMD processor according to Claim 13,
wherein the SIMD operation is addition, and 

the execution unit adds (i) a value held in the first register
and said value held in the first register, and adds 1 to an addition
result when the flag indicates the first status, and (ii) the value held
in the first register and a value held in the second register, and adds
1 to an addition result when the flag indicates the second status.

15. The SIMD processor according to Claim 14, 

wherein the execution unit, when two pieces of data a1 and a2
are stored in the first register and two pieces of data b1 and b2 are
stored in the second register, calculates (i) (a1+a1+1) and
(a2+a2+1) when the flag indicates the first status, and (ii)
(a1+b1+1) and (a2+b2+1) when the flag indicates the second
status.



16. The SIMD processor according to one of Claims 10~12, 
wherein the flag storage unit further stores a second flag, and 
the execution unit determines whether to round the operation
result or not depending on a value of the second flag.

17. The SIMD processor according to one of Claims 11, 12, 14 and 
15, 

wherein the execution unit further divides the operation
result by 2.
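Claims 13 to 17 layer rounding (+1), a rounding-enable flag and a final divide by 2 on top of the flagged add. This is a minimal sketch, assuming unsigned 16-bit lanes in a 32-bit register; the parameter names `round_enable` and `halve` are hypothetical stand-ins for the second flag of Claim 16 and the division of Claim 17.

```python
def lanes16(reg):
    """Split a 32-bit register into [lower, higher] 16-bit lanes."""
    return [reg & 0xFFFF, (reg >> 16) & 0xFFFF]


def simd_add_round(r1, r2, first_status, round_enable=True, halve=False):
    """Claims 13-17 sketch: per-lane add, optional +1 rounding, optional divide by 2.

    Lanes are treated as unsigned 16-bit values (assumption); the sum is kept at
    full precision before halving, so (x + y + 1) >> 1 cannot overflow the lane.
    """
    a = lanes16(r1)
    b = a if first_status else lanes16(r2)
    out = 0
    for i, (x, y) in enumerate(zip(a, b)):
        s = x + y + (1 if round_enable else 0)   # Claims 14-15: add 1; Claim 16: flag-controlled
        if halve:                                # Claim 17: divide the result by 2
            s >>= 1
        out |= (s & 0xFFFF) << (16 * i)
    return out
```

With rounding enabled and `halve=True`, each lane computes the familiar rounded average (x + y + 1) >> 1.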



18. A SIMD processor for executing SIMD instructions, 
comprising:

a decoding unit operable to decode an instruction; and 
an execution unit operable to execute the instruction based
on a result of the decoding performed by the decoding unit,

wherein the execution unit, when the decoding unit decodes a 
SIMD instruction for generating a value according to a sign of each
of a plurality of data elements, generates data indicating that each 
of the plurality of data elements is one of a positive value, zero, and 
a negative value. 

19. The SIMD processor according to Claim 18, 

wherein the execution unit generates 1, 0, and -1 depending 
on whether each of the plurality of data elements is a positive value, 
zero, or a negative value. 

20. The SIMD processor according to Claim 19, 

wherein the SIMD instruction includes a specification of a first 
register storing the plurality of data elements and a second register 
storing the data generated by the execution unit, and 
the execution unit stores 1, 0, and -1 into a plurality of
storage locations in the second register by associating said storage
locations with a plurality of storage locations in the first register 
storing the plurality of data elements. 
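Claims 18 to 20 describe a per-lane sign function. This sketch assumes two signed 16-bit lanes in a 32-bit register, two's-complement encoding, and that -1 is written as all ones in its lane; the names are hypothetical.

```python
def to_signed(v, bits):
    """Interpret the low `bits` bits of v as a two's-complement integer."""
    v &= (1 << bits) - 1
    return v - (1 << bits) if v & (1 << (bits - 1)) else v


def simd_sign(src, lane_bits=16, lanes=2):
    """Claims 18-20: write 1, 0 or -1 into the destination lane matching each source lane."""
    mask = (1 << lane_bits) - 1
    dst = 0
    for i in range(lanes):
        x = to_signed(src >> (i * lane_bits), lane_bits)
        s = (x > 0) - (x < 0)                 # 1, 0 or -1
        dst |= (s & mask) << (i * lane_bits)  # -1 becomes all ones in the lane
    return dst
```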

21. A SIMD processor for executing SIMD instructions,
comprising: 

a parameter specification unit operable to specify a first 
parameter and a second parameter; 

a decoding unit operable to decode an instruction; and 
an execution unit operable to execute the instruction based
on a result of the decoding performed by the decoding unit,

wherein the execution unit, when the decoding unit decodes 
an instruction on first data, performs a bit-shift on the first data 
according to the first parameter, and outputs a plurality of word data 
at word positions identified by the second parameter, out of the
obtained shifted data.

22. The SIMD processor according to Claim 21,

wherein the execution unit, when the shifted data includes 
contiguous first ~ third word data, generates (i) two pieces of the
first word data and two pieces of the second word data in this order
when the second parameter indicates a first status, and (ii) one
piece of the first word data, two pieces of the second word data, and
one piece of the third word data in this order when the second
parameter indicates a second status.

23. The SIMD processor according to Claim 21,

wherein the execution unit, when the shifted data includes 
contiguous first ~ fourth word data, generates (i) two pieces of the
first word data and two pieces of the second word data in this order
when the second parameter indicates a first status, and (ii) one
piece of the first word data, one piece of the third word data, one
piece of the second word data, and one piece of the fourth word data
in this order when the second parameter indicates a second status.

24. The SIMD processor according to one of Claims 21~23, 

wherein the first parameter and the second parameter are
flags.

25. The SIMD processor according to one of Claims 21~23,
wherein each word is a byte.
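Claims 21 to 25 can be sketched with byte-sized words (Claim 25). The shift direction (right), the data width, and taking the least significant byte of the shifted value as the "first" word are all assumptions; `interleave` is a hypothetical switch between the second-status orders of Claims 22 and 23.

```python
def shift_and_select(data, shift, second_param, interleave=False):
    """Claims 21-25 sketch: shift `data` right by `shift` bits, then pick byte words.

    Byte 0 of the shifted value is taken as the "first" word (assumption).
    second_param == 0 is the first status; any other value is the second status.
    interleave=False follows Claim 22, interleave=True follows Claim 23.
    """
    shifted = data >> shift
    w = [(shifted >> (8 * i)) & 0xFF for i in range(4)]
    if second_param == 0:
        order = (0, 0, 1, 1)      # first, first, second, second
    elif interleave:
        order = (0, 2, 1, 3)      # first, third, second, fourth (Claim 23)
    else:
        order = (0, 1, 1, 2)      # first, second, second, third (Claim 22)
    return [w[i] for i in order]
```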

26. A processor for decoding and executing instructions, 
comprising:

a decoding unit operable to decode an instruction; and 
an execution unit operable to execute the instruction based 
on a result of the decoding performed by the decoding unit,

wherein the execution unit, when the decoding unit decodes 
an add instruction including operands specifying first data and 
second data, generates (i) a result obtained by adding the first
data, the second data, and 1 when the first data is zero or positive,
and (ii) a result obtained by adding the first data and the second
data when the first data is negative.

27. The processor according to Claim 26, 

wherein the first data is an object of absolute value rounding,
and

the second data specifies a digit in the first data to be an 
object of absolute value rounding. 

28. The processor according to Claim 27, 

wherein the second data is a value in which a digit 
corresponding to the digit in the first data to be an object of absolute 
value rounding is 1 and other digits in the first data are 0. 
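The add of Claim 26 is a sign-dependent carry-in. This sketch transcribes the claim literally for signed 32-bit values with wraparound (an assumption) and does not attempt to reconstruct any later shift that would complete the rounding; the names are hypothetical.

```python
def to_s32(v):
    """Wrap an integer to a signed 32-bit value."""
    v &= 0xFFFFFFFF
    return v - (1 << 32) if v & 0x80000000 else v


def add_abs_round(a, b):
    """Claim 26: a + b + 1 when a >= 0, a + b when a < 0 (32-bit wraparound assumed)."""
    carry_in = 1 if a >= 0 else 0   # the extra 1 acts as a sign-dependent carry-in
    return to_s32(a + b + carry_in)
```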

29. A processor for decoding and executing instructions, 
comprising: 

a plurality of registers; 

a decoding unit operable to decode an instruction; and 
an execution unit operable to execute the instruction based 
on a result of the decoding performed by the decoding unit, 

wherein the execution unit, when the decoding unit decodes 
an instruction for an operation on a first register and a second 
register, stores a result of the operation on the first register into a 
third register, and stores a result of the operation on the second 
register into a fourth register contiguous to the third register. 

30. A processor for decoding and executing instructions, 
comprising: 

a flag storage unit operable to store a plurality of flags used 
as predicates of a condition execution instruction; 

a decoding unit operable to decode an instruction; and
an execution unit operable to execute the instruction based 
on a result of the decoding performed by the decoding unit, 

wherein the execution unit, when the decoding unit decodes a 
loop branch instruction including an operand specifying a flag, 
branches to a top of a loop, and makes a setting of the flag. 

31. The processor according to Claim 30, 

wherein the flag is used as a predicate of one of an EPILOG 
instruction and a PROLOG instruction in a case where the loop is 
unrolled through software pipelining. 

32. The processor according to Claim 30, 

wherein the plurality of flags are specified as operands in the 
branch instruction, and 

the execution unit performs the branch and a transfer among
the plurality of flags.

33. The processor according to Claim 32, 

wherein the plurality of flags are used as predicates of an 
EPILOG instruction, a KERNEL instruction and a PROLOG instruction 
in a case where the loop is unrolled through software pipelining. 
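The claims do not say how the flags are transferred at each loop branch. The sketch below is a hypothetical model, assuming one flag per pipeline stage that shifts one stage later on every branch, similar in spirit to rotating predicate registers. Under that assumption a loop of n source iterations over three stages makes n + 2 passes, and each stage runs exactly n times. All names are hypothetical.

```python
def loop_branch(flags, more_iterations):
    """Hypothetical model of Claims 30-33: one loop-branch step.

    flags holds three stage predicates [stage0, stage1, stage2]. Each branch
    moves every flag one stage later and loads `more_iterations` into stage0,
    then branches back to the loop top while any stage is still enabled.
    """
    new_flags = [more_iterations, flags[0], flags[1]]
    taken = any(new_flags)
    return taken, new_flags


def run_pipelined_loop(n_iters):
    """Return the predicate pattern seen on each pass through the loop body."""
    flags = [True, False, False]     # only the first stage is enabled on entry
    remaining = n_iters - 1          # source iterations still to be started
    trace = []
    taken = True
    while taken:
        trace.append(tuple(flags))   # the stage instructions execute under these flags
        taken, flags = loop_branch(flags, remaining > 0)
        remaining -= 1
    return trace
```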

34. A processor for decoding and executing instructions, 
comprising: 

a branch register for storing a branch target address; 

a flag storage unit operable to store a plurality of flags used 
as predicates of a condition execution instruction; 

a decoding unit operable to decode an instruction; and 

an execution unit operable to execute the instruction based 
on a result of the decoding performed by the decoding unit, 

wherein the execution unit, when the decoding unit decodes a 
store instruction including an operand specifying a flag, the store
instruction for storing the branch target address in the branch 
register, stores a top address of a loop into the branch register, and 
makes a setting of the flag. 

35. The processor according to Claim 34, 

wherein the flag is used as a predicate of one of an EPILOG 
instruction and a PROLOG instruction in a case where the loop is 
unrolled through software pipelining. 

36. The processor according to Claim 34, 

wherein the plurality of flags are specified as operands in the 
store instruction, and 

the execution unit performs the storage and makes settings of 
the plurality of flags when the store instruction is decoded. 

37. The processor according to Claim 36, 

wherein the plurality of flags are used as predicates of an 
EPILOG instruction, a KERNEL instruction and a PROLOG instruction 
in a case where the loop is unrolled through software pipelining. 

38. A SIMD processor for executing SIMD instructions, 
comprising: 

a decoding unit operable to decode an instruction; and 
an execution unit operable to execute the instruction based 
on a result of the decoding performed by the decoding unit, 

wherein the execution unit, when the decoding unit decodes a 
SIMD instruction for determining a sum of absolute value 
differences between a plurality of data pairs, generates a value 
obtained by adding absolute value differences between each of the 
plurality of data pairs. 

39. The SIMD processor according to Claim 38,

wherein the SIMD instruction includes a specification of first 
data in addition to the plurality of data pairs, and 

the execution unit generates a value obtained by adding the 
first data to the value obtained by adding the absolute value 
differences between each of the plurality of data pairs. 

40. The SIMD processor according to Claim 38, 

wherein the plurality of data pairs are specified by two 
registers, and 

the execution unit determines absolute value differences 
between each of byte data pairs in the two registers, and generates 
the value by adding all the absolute value differences. 
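Claims 38 to 40 describe a sum of absolute differences (SAD) over byte pairs. This minimal sketch assumes two 32-bit registers of unsigned bytes; the optional `acc` argument models the extra first data of Claim 39. The function name is hypothetical.

```python
def sad_bytes(r1, r2, acc=0, n_bytes=4):
    """Claims 38-40: sum of |x - y| over the byte pairs of r1 and r2, plus acc (Claim 39)."""
    total = acc
    for i in range(n_bytes):
        x = (r1 >> (8 * i)) & 0xFF   # unsigned byte i of the first register
        y = (r2 >> (8 * i)) & 0xFF   # matching byte of the second register
        total += abs(x - y)
    return total
```

SAD is the usual cost function in block-matching motion estimation; an accumulator operand like `acc` lets a running total be carried across rows.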

41. A processor for decoding and executing instructions, 
comprising: 

a decoding unit operable to decode an instruction; and 
an execution unit operable to execute the instruction based 
on a result of the decoding performed by the decoding unit, 

wherein the execution unit, when the decoding unit decodes a 
saturation instruction including operands specifying first data and 
second data, generates (i) a saturated value when the first data is
larger than the saturated value identified by the second data, and
(ii) the first data when the first data is equal to or smaller than the
saturated value.

42. The processor according to Claim 41, 

wherein the first data and the saturated value are signed 
values. 

43. The processor according to Claim 41, 

wherein the second data specifies a digit where saturation is 
performed.



44. The processor according to Claim 42, 

wherein the second data is a value in which a digit larger than 
a digit corresponding to the saturated value is 1, and in which a digit 
that is equal to or smaller than the digit corresponding to the 
saturated value is 0. 
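With the mask encoding of Claim 44, the saturated value is simply the bitwise complement of the second data. This sketch assumes signed 32-bit values; the names are hypothetical.

```python
def to_s32(v):
    """Wrap an integer to a signed 32-bit value."""
    v &= 0xFFFFFFFF
    return v - (1 << 32) if v & 0x80000000 else v


def saturate(a, mask):
    """Claims 41-44: clamp signed a to the limit encoded by mask.

    mask has 1s above the saturation digit and 0s at and below it (Claim 44),
    so its bitwise complement is the saturated value itself.
    """
    limit = to_s32(~mask)
    return limit if a > limit else a
```

For example, `mask = 0xFFFFFF80` encodes a limit of 127. As written, Claim 41 bounds the value only from above, so -300 passes through unchanged.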

45. A processor for decoding and executing instructions, 
comprising: 

a plurality of "n"-word-long registers; 

a decoding unit operable to decode an instruction; and 

an execution unit operable to execute the instruction based 
on a result of the decoding performed by the decoding unit, 

wherein the execution unit, when the decoding unit decodes 
an instruction for selecting word data on a word-by-word basis, the 
instruction including operands specifying first ~ third registers and
one parameter, stores "n" pieces of word data selected by the
parameter into the third register, out of 2"n" pieces of word data
stored in the first register and the second register.

46. The processor according to Claim 45, 

wherein the parameter is a value stored in a fourth register. 

47. The processor according to Claim 45, 
wherein the parameter is an immediate value. 

48. The processor according to Claim 45, 

wherein the parameter includes a flag indicating whether or
not the "n" pieces of word data are stored individually into each of "n"
locations in the third register, and

the execution unit selectively stores or does not store the "n" pieces
of word data into the third register according to the flag.

49. The processor according to one of Claims 45~48,
wherein each word is a byte.
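Claims 45 to 49 can be sketched for n = 4 byte lanes. The parameter layout below (4 bits per destination lane: a 3-bit source selector plus a write-enable bit for Claim 48) is entirely hypothetical; the claims leave the encoding open, and the parameter may come from a register or an immediate (Claims 46 and 47).

```python
def select_bytes(r1, r2, param, dst=0):
    """Claims 45-49 sketch for n = 4 byte lanes.

    Hypothetical parameter encoding: 4 bits per destination lane, where bits 0-2
    pick one of the 8 source bytes (r1 bytes 0-3, then r2 bytes 0-3) and bit 3
    enables the write (Claim 48). Disabled lanes keep their old value in dst.
    """
    pool = ([(r1 >> (8 * i)) & 0xFF for i in range(4)]
            + [(r2 >> (8 * i)) & 0xFF for i in range(4)])
    out = dst
    for lane in range(4):
        field = (param >> (4 * lane)) & 0xF
        sel, enable = field & 0x7, field >> 3
        if enable:
            shift = 8 * lane
            out = (out & ~(0xFF << shift)) | (pool[sel] << shift)
    return out & 0xFFFFFFFF
```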

50. A SIMD processor for executing SIMD instructions, 
comprising: 

a decoding unit operable to decode an instruction; and 

an execution unit operable to execute the instruction based
on a result of the decoding performed by the decoding unit,

wherein the execution unit, when the decoding unit decodes a
SIMD instruction, generates a plurality of operation results by
performing a SIMD operation, and performs bit extension on at least
one of the plurality of operation results.

51. The SIMD processor according to Claim 50, 

wherein two half words are stored in a word-long register as 
the operation results, and one of said two half words is extended to 
word data in the SIMD instruction. 

52. The SIMD processor according to Claim 50, 

wherein two half words are stored in a word-long register as 
the operation results, and each of said two half words is extended to 
word data in the SIMD instruction. 
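Claims 50 to 52 fold a widening step into the SIMD instruction. This sketch uses addition as a hypothetical choice of SIMD operation and assumes sign extension of each 16-bit lane result to a 32-bit word, as in Claim 52; the names are hypothetical.

```python
def sext16(v):
    """Sign-extend the low 16 bits of v to a Python int."""
    v &= 0xFFFF
    return v - 0x10000 if v & 0x8000 else v


def simd_add_extend(r1, r2):
    """Claims 50-52 sketch: two-lane 16-bit add, each lane result sign-extended to a word.

    Addition is a hypothetical choice of SIMD operation. Returns (lower, higher)
    32-bit words; a Claim 51 variant would extend only one of the two results.
    """
    lo = sext16((r1 & 0xFFFF) + (r2 & 0xFFFF))   # lower lane, carry out of bit 15 dropped
    hi = sext16((r1 >> 16) + (r2 >> 16))         # higher lane of 32-bit inputs
    return lo & 0xFFFFFFFF, hi & 0xFFFFFFFF
```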

53. A SIMD processor for executing SIMD instructions, 
comprising: 

a flag storage unit operable to store a flag; 
a decoding unit operable to decode an instruction; and 
an execution unit operable to execute the instruction based 
on a result of the decoding performed by the decoding unit, 

wherein the execution unit, when the decoding unit decodes 
an instruction for performing a SIMD operation on a plurality of data
pairs, performs a SIMD operation identified by the flag stored in the 
flag storage unit on each of the plurality of data pairs. 

54. The SIMD processor according to Claim 53,

wherein the flag storage unit stores a first flag and a second
flag,
the instruction includes a specification of a first data pair and
a second data pair, and
the execution unit performs an operation indicated by a value
of the first flag on the first data pair, and an operation indicated by
a value of the second flag on the second data pair.

55. The SIMD processor according to Claim 53, 
wherein the flag storage unit stores a first flag and a second
flag,

the instruction includes a specification of a data pair, and 
the execution unit performs an operation indicated by a value 
of the first flag on the data pair, and an operation indicated by a 
value of the second flag on the data pair.


